(57) Abstract

Methods which allow for nutritional improvement
of plants and plant tissue by increasing the amount of a
selected amino acid(s) in a seed storage protein involve
altering a naturally-occurring seed storage protein gene.
Oligonucleotides coding for the protein are assembled by
use of overlapping synthesized DNA sequences.



   PstI

TAACTGCAG ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC   48
          Met Ala Asn Ile Ser Val Val Ala Ala Ala Leu Leu Val
          1               5                   10

TTG CTG GTG TTG GGT CAT GCC ACT GCA AGC ATC TAC AGG ACA GTT GTG   96
Leu Leu Val Leu Gly His Ala Thr Ala Ser Ile Tyr Arg Thr Val Val
    15                  20                  25

GAG TTT GAA GAG GAT GAT GCC ACC AAC CCA ATA GGT CCT AAG ATG AGG  144
Glu Phe Glu Glu Asp Asp Ala Thr Asn Pro Ile Gly Pro Lys Met Arg
30                  35                  40                  45

AAA TGC AGA AAG GAG TTC CAG AAG GAA CAA ATG TTG AGA GCT TGC CAA  192
Lys Cys Arg Lys Glu Phe Gln Lys Glu Gln Met Leu Arg Ala Cys Gln
                50                  55                  60

CAA TGG TTG AGG AAA CAA GCT AGA CAA GGA AGA TCT GAT GAA TTT GAC  240
Gln Trp Leu Arg Lys Gln Ala Arg Gln Gly Arg Ser Asp Glu Phe Asp
            65                  70                  75

TTT GAA GAT GAC ATG GAG AAT CCT CAA GGA CCA CAG CAG AGA CCT CCT  288
Phe Glu Asp Asp Met Glu Asn Pro Gln Gly Pro Gln Gln Arg Pro Pro
        80                  85                  90

CTC CTT CAG AAG TGC TGT GAG CAA CTC AAA CAG ATG CAA TCT CAG TGT  336
Leu Leu Gln Lys Cys Cys Glu Gln Leu Lys Gln Met Gln Ser Gln Cys
    95                  100                 105

GTT TGC CCA ACC CTT AAA GGT GCC AGC AAA GCT GTG AAA CAG GAA GAG  384
Val Cys Pro Thr Leu Lys Gly Ala Ser Lys Ala Val Lys Gln Glu Glu
110                 115                 120                 125

CAG CAA CAA GGC OVC CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC  432
Gln Gln Gln Gly Gln Gln Gln Gly Lys Gln Gln Met Val Arg Lys Ile
                130                 135                 140

TAT AAG ACT GCC AAA CAC CTT CCT AAA CTTC TGT GAC ATT CCA CAG GTT  480
Tyr Lys Thr Ala Lys His Leu Pro Lys Val  Cys Asp Ile Pro Gln Val
            145                 150                  155

                                                         EcoRI

GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAG AATT  529
Asp Val Cys Pro Phe Gln Lys Thr Met Pro Gly Pro Ser Tyr
        160                 165                 170

CAAT  533
PROCESS FOR ENHANCING THE CONTENT OF A SELECTED AMINO ACID
IN A SEED STORAGE PROTEIN

Technical Field 

The present invention relates to methods of producing
transgenic plants having an increased content of selected
amino acids in modified seed storage proteins and, more
particularly, to methods of making an improved seed storage
protein.

Background of the Invention

Greater recognition of the role of plants in supplying
essential amino acids to the animal world has led to
emphasis on the development of new food plants that have
proteins that are better balanced for human and animal
nutrition. Classical plant breeding techniques have
limitations for achieving this goal. Molecular genetics,
however, shows potential for overcoming these limitations.

Seed storage proteins represent up to 90% of total seed
protein in many plant seeds. Shotwell and Larkins (1989)
In: The Biochemistry of Plants Vol. 15 (Academic Press, San
Diego: Stumpf and Conn, eds.) Chapter 7: 29. These
naturally-occurring proteins are used as a source of
nutrition for young seedlings for the growth period just
following germination. The genes encoding them are strictly
regulated, being expressed in a highly tissue-specific and
developmentally stage-specific fashion. Walling, et al.
(1986) Proc. Natl. Acad. Sci. 83, 2123-2127; Higgins, T.J.V.
(1984) Ann. Rev. Plant Physiol. 35, 191-221. Thus they are
expressed almost exclusively in developing seed, and
different classes of seed storage proteins may be expressed
at different stages in the development of the seed.

The expression of foreign genes in plants is well
established. De Blaere, et al. (1987) Methods in Enzymology
153, 277. Seed storage protein genes have been transferred
to other plants. Okamura, et al. (1986) Proc. Natl. Acad.
Sci. 83, 8240; Sengupta-Gopalan, et al. (1985) Proc. Natl.
Acad. Sci. 82, 3320; Higgins, et al. (1988) Plant Mol. Biol.
11, 683; Ellis, et al. (1988) Plant Mol. Biol. 10, 203;
Barker, et al. (1988) Proc. Natl. Acad. Sci. 85, 458;
Vandekerckhove, et al. (1989) Bio/Technol. 7, 929; and
Altenbach, et al. (1989) Plant Mol. Biol. 13, 513. In most
of these cases it was shown that within its new environment,
the transferred seed storage protein gene is expressed in a
tissue-specific and developmentally regulated manner.
Beachy, et al. (1985) EMBO J. 4, 3047. The expression
levels varied, but reached as high as 8% of the total seed
protein. Altenbach, et al., supra; Voelker, et al. (1989)
Plant Cell 1, 95.

However, design of a synthetic seed storage protein
requires more than mere substitution of the desired amino
acid for naturally-occurring amino acids in the target
protein. Criteria must be defined for maximizing the
potential of success and the ultimate expression of the gene
in the targeted host plant. Even selection of the class of
storage proteins least likely to present difficulties is
important, and is dependent on the availability of sequence
data for that class of proteins, the relative gene size
within that class, and the degree of processing and
post-translational modification necessary for deposition.
Seed storage proteins are nominally classified by density
gradient sedimentation values: 2S, 7S, and 11S. Although
the 7S and 11S proteins tend to be one general type per
sedimentation value, the 2S seed proteins are a diverse
group. The 2S sedimentation value implies a relatively low
molecular weight, and the 2S proteins include classic
storage proteins as well as lectins, protease inhibitors,
and others. The 2S storage proteins appear to be less
restricted in amino acid composition than 7S and 11S
proteins, and include species which are relatively rich in
basic amino acids. Additionally, the 2S storage proteins
are encoded on small genes, making the prospect of
synthesizing a new 2S gene from oligonucleotides attractive.

Among published seed protein sequence data, no protein
incorporating a non-limiting amount of lysine has been
identified. Lysine comprises from 3 to 7% of the total
amino acids in known seed protein sequences. It is
estimated that a protein containing 10 to 15% lysine,
expressed transgenically at a level of 2 to 5%, is necessary
to cause a noticeable increase in seed deposition of lysine.
No storage protein-coding sequence which meets this
criterion is known.
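
The lysine criterion stated above can be checked for any candidate protein by simple counting. The following Python sketch is offered only as an illustration: it computes the lysine fraction of a one-letter protein sequence and compares it with the 10% lower bound given above. The function names and the example peptide are hypothetical and are not part of the disclosure.

```python
def lysine_fraction(protein: str) -> float:
    """Return the fraction of residues that are lysine (K) in a one-letter sequence."""
    residues = protein.upper()
    if not residues:
        raise ValueError("empty sequence")
    return residues.count("K") / len(residues)


def meets_lysine_target(protein: str, minimum: float = 0.10) -> bool:
    """True if lysine makes up at least `minimum` of the residues (10% by default)."""
    return lysine_fraction(protein) >= minimum


# Hypothetical 20-residue peptide containing 3 lysines; not a sequence from the text.
example = "MKAQLKEGSTPVKAAGLSTQ"
print(f"{lysine_fraction(example):.0%}", meets_lysine_target(example))
```

A native seed protein in the 3 to 7% range would fail this check, whereas a designed sequence in the 10 to 15% range would pass it.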

Storage proteins can be modified by incorporating
inserts containing one or more selected amino acids such as
lysine, resulting in a lysine-rich polypeptide that can be
transferred into plant cells. Or, following the design of a
storage protein with a known sequence, a lysine-rich
polypeptide can be synthesized by substitution of specific
amino acids and transferred into a host cell.

There is a recognized need for lysine-rich seed storage
proteins and for an efficient, accurate method of producing
the same. Further, there is also a recognized need for a
method to produce a DNA or cDNA sequence that codes for an
increased amount of any essential amino acid that can be
expressed transgenically as a seed storage protein. A DNA
"coding sequence" is a DNA sequence which is transcribed and
translated into a polypeptide in vivo when placed under the
control of appropriate regulatory sequences. The boundaries
of the coding sequence are determined by the ATG start codon
at the 5' terminus and a translation stop codon at the 3'
terminus. Examples of coding sequences include cDNA, genomic
DNA sequences from cells, and synthetic DNA sequences.

When designing sequences to be rich in certain amino
acids, care must be taken that the substitutions with the
selected amino acids do not influence the stability of the
modified 2S protein. Certain insertions, such as long
stretches of particular amino acids, may result in shapes
and turns which would cause instability, poor expression, or
poor accumulation due to disruption in normal folding
patterns of the protein. In addition, replacement must be
conservative in that hydrophobic amino acids and those
giving charge and polarity are not substituted, so that the
overall structure and stability of the molecule will not be
adversely affected. Polarity and direction are due to
acidic (negative) and basic (positive) charges on the amino
acid residues.

To synthesize DNA molecules, the two complementary
strands are constructed separately because only
single-stranded DNA (oligonucleotides) can be synthesized.
These are then hybridized (by formation of hydrogen bonds)
and linked to larger DNA units by enzymatic coupling in
order to construct genes or their regulatory units. A gene
is a DNA sequence responsible for the production of
polypeptides. It is now possible, given the various DNA
recombination techniques, to construct any given gene,
whether synthetic or natural, to reproduce it, and to
convert it into polypeptides using whole cell systems.

Oligonucleotides are polymers built up by
polycondensation of nucleoside phosphates. In the past, the
majority of synthetic genes have been assembled using
complementary oligonucleotides which represent both entire
strands. Gapped fill-ins refer to the pairing of
complementary nucleotides along sections of DNA where
pairing is incomplete (single-stranded sections) to form
complementary DNA strands for those segments. Gapped
fill-ins have been published only for single pairs of
overlapping oligonucleotides, which limits the length of the
target molecule. Thus, construction of long synthetic
sequences required subcloning (moving a sequence from one
vector to another to produce copies) and/or pasting together
of regions via restriction sites. The only method utilizing
an overlap extension procedure requires splicing of
double-stranded gene fragments. Horton, et al. (1989) Gene
77, 61.
The sequential extension method presented by this
invention obviates the subcloning requirement, and allows
simple, one-day assembly of larger gene regions. This
method is approximately 30% more cost-effective, even
without consideration of personnel time, than the usual
method of assembling complete complementary oligonucleotides
because it allows enzymatic synthesis of gap regions.
Khorana (1968) Pure Appl. Chem. 17, 349. A more recent
publication offers similar cost savings by incorporation of
a terminal 3' hairpin structure to prime synthesis of the
second strand. However, that method is limited by the length
of oligonucleotides. Uhlmann, et al. (1988) Gene 71, 29.
Another method utilizes short overlap regions to prime
polymerase, but both of these methods rely on ligation of
separate double-stranded regions for assembly. Rink, et al.
(1984) Nucleic Acid Res. 12, 16; Rossi, et al. (1982) J.
Biol. Chem. 257, 9226. A third method relies on in vivo gap
repair, and requires that one strand of synthetic DNA be
complete, though it may contain nicks bridged by short
oligonucleotides of the opposite strand. It has only been
used to assemble a 270 bp fragment. Adams, et al. (1989)
Nucleic Acid Res. 16, 4287.

The advantages of this invention are: (a) it is cost
effective because fewer oligonucleotides are required and
less time is spent in oligonucleotide preparation because
crude oligonucleotides work well; (b) it is a simple
two-reaction (extension/amplification) procedure that is
complete in 1-2 days; (c) it does not require that
restriction sites for assembly by ligation be included in
gene design, hence no unnecessary mutations are introduced;
(d) it enables rapid inclusion of degenerate oligonucleotide
regions if desired, without separate assembly or cloning
reactions; and (e) it enables the assembly of chimeric genes
without the introduction of mutagenic restriction sites,
i.e., it enables "perfect" promoter-gene fusions.

The present invention further provides improvements in
the nutritional value of edible organisms, including, but
not limited to, higher plants. In particular, the present
invention provides for the assembly of synthetic
oligonucleotides by means of overlapping sequences,
including the nucleic acid sequences encoding the
lysine-rich proteins.

In one embodiment, the present invention provides
nucleic acids in the form of a DNA molecule, which encode
one or more subunits of a lysine-rich (approximately 14%) 2S
seed storage-type protein. Other isoforms will be at least
about 80% homologous at the amino acid sequence level to
this representative member, preferably at least about 85%
homologous, and more preferably at least about 90%
homologous.

In a further embodiment, the present invention provides
a cell comprising a replicon containing the chemically-
synthesized, lysine-rich 2S storage protein combined with a
promoter which includes regulatory sequences that provide
for the expression of said protein in said cell, said
subunit being heterologous to said cell. In particularly
preferred embodiments, the cellular host is a higher plant
or animal cell.






Brief Description of the Figures

Figure 1 shows the complete nucleic acid sequence of a
2S seed storage protein with increased lysine content. The
double-stranded molecule is cleaved with restriction enzymes
(PstI and EcoRI) at bases as indicated to allow cloning.

Amino acid residues are numbered beneath the sequence.
Mature protein is comprised of residues 39-74 (small
subunit) crosslinked via S-S bonds to residues 85-170 (large
subunit). Residues 1-38 constitute a signal sequence and
N-terminus processed site. Residues 75-84 constitute a
"linker" type peptide, which is excised as the protein
folds. Residue 171 is a carboxy-terminal residue which is
also excised at protein maturity.

Figure 2 shows the oligonucleotides used in
construction of the 2S seed protein and their design as
pairs sharing complementary overlap regions of 17-24
nucleotides. Each pair has a similar overlap with the
adjacent pair.

Figure 3 shows the first and second sequential
extension products that are formed as the six extensions are
implemented.

Figure 4 shows the third, fourth, fifth, and sixth
sequential extension products that are formed as the six
extensions are completed.

Disclosure of the Invention 


In addition to the techniques described below, the
practice of the present invention will employ conventional
techniques of molecular biology, microbiology, recombinant
DNA technology, and plant science, all of which are within
the skill of the art. Such techniques are explained fully
in the literature. See, e.g., Maniatis et al., Molecular
Cloning: A Laboratory Manual (1982); DNA Cloning: Volume
I and II (D.N. Glover, ed., 1985); Oligonucleotide Synthesis
(M.J. Gait, ed., 1984); Nucleic Acid Hybridization (B.D.
Hames & S.J. Higgins, eds., 1985); Transcription and
Translation (B.D. Hames & S.J. Higgins, eds., 1984); Animal
Cell Culture (R.I. Freshney, ed., 1986); Plant Cell Culture
(R.A. Dixon, ed., 1985); Propagation of Higher Plants
Through Tissue Culture (K.W. Hughes et al., eds., 1978);
Cell Culture and Somatic Cell Genetics of Plants (I.K.
Vasil, ed., 1984); Fraley et al. (1986) CRC Critical Reviews
in Plant Sciences 4, 1; Biotechnology in Agricultural
Chemistry: ACS Symposium Series 334 (LeBaron et al., eds.,
1987), the disclosures of which are well-known and are
hereby incorporated herein by reference.

The design of the prototypical synthetic plant gene
herein is based on published regulatory sequences, including
reported enhancer (repetitive) regions found uniquely in
seed storage genes. In addition, computer modeling of both
hydropathy and evolutionary relatedness of known seed
proteins was used in the planning of potential coding
sequences, as well as inclusion of codon biases found in
published storage protein gene sequences.

With respect to the choice of the regions to be
modified, the present invention varies significantly from
other work which has been done in this field. European
Patent Application No. 318,341 describes a method for
replacement or supplementation of the hypervariable region
of a 2S albumin gene. Based on a model of the Arabidopsis
thaliana 2S albumin, the hypervariable region is defined as
a section of the large subunit of the protein between the
sixth and seventh cysteine residues where little
conservation of amino acids is observed. A non-conserved
region is a region wherein the nucleotide sequence can be
modified either by insertion into it or replacement of a
nucleotide sequence which, at least in part, may be foreign
to the natural nucleic acid encoding the precursor of the 2S
albumins of the plant cells concerned and encodes the
appropriate amino acids, without disturbing the stability
and correct processing of the storage protein or its
transport into parts of the cell. The modification procedure
is called site-directed mutagenesis.

The synthetic gene sequence was constructed by the
general process of sequential extension of overlapping 3'
ends using DNA polymerase. The sequence was designed to be
assembled from six pairs of synthetic oligonucleotides
(partial sequences), each having 3' overlap within the pair,
as well as 3' overlap between adjacent pairs. Assembly
comprises three parts: filling in pairs to create
double-stranded segments; combining all duplexed segments
and sequentially extending to form a small number of full
length genes; and amplifying (PCR) complete molecules to a
quantity sufficient for cloning. It is an efficient and
streamlined procedure, useful for constructing large genes
with little or no possibility of misjoinder and without the
need for intermediate vectors. Numerous pairs of partial
sequences can be used to assemble large synthetic genes.
There is no limit to the size of predetermined gene
structure that this synthetic strategy will allow.
Accordingly, it is anticipated that this invention will find
important utilization by those skilled in the art.

In one embodiment, each pair is filled in by combining
two (paired) oligonucleotides (100 pmol each) in a suitable
solution for bonding, comprising 15 µM each dNTP, 40 mM
Tris-Cl pH 7.5, 20 mM MgCl2, 50 mM NaCl, and 10 mM DTT
(25 µl final volume). The oligonucleotide mix is heat
denatured (95°C) and allowed to anneal by slowly cooling to
room temperature. Heat-sensitive DNA polymerase (examples:
E. coli "Klenow", Sequenase {Registered: US Biochemical}) is
added (1.5 U) and the reaction allowed to proceed 10 minutes
at room temperature. Alternatively, heat-stable polymerase
(e.g., Taq polymerase) may be substituted if the buffer is
replaced with 50 mM KCl, 10 mM Tris-Cl pH 8.3, 1.5 mM MgCl2,
0.01% BSA, and the reaction mix annealed at 55°C and
extended at 72°C.

Sequential extension of these pairs is accomplished by
combining aliquots of each of the above reactions, adding
sufficient dNTPs, and sequentially heating, reannealing, and
extending in the presence of polymerase. This is easily
accomplished using Taq polymerase and commercially available
heat cycling blocks (e.g., DNA Thermal Cycler
{Perkin-Elmer/Cetus}), and requires buffer adjustment as
noted above. Heat-labile polymerase may be substituted, but
requires manual transfer of tubes between heat blocks of
suitable temperature. The number of cycles required to
generate full length sequences is dependent on the number of
duplexed components, and is minimally half that number. To
generate sufficient full length molecules to allow gel
detection, the molecules must be cycled a greater number of
times. In the example from the previous paragraph, the
partial sequences were sequentially extended for a total of
12 cycles in order to discern full length molecules.
Obtaining a clonable amount of this gene sequence is
possible using PCR, and requires only a small portion (2%)
of the sequential extension reaction as template.

Modes for carrying out the Invention

Example 1

Design of the protein



A putative 2S seed storage protein sequence was derived
from published protein sequences. Crouch, et al. (1983) J.
Mol. Appl. Genet. 2, 273; Ericson, et al. (1986) J. Biol.
Chem. 261, 14576; Altenbach, et al. (1987) Plant Mol. Biol.
8, 239; Krebbers, et al. (1988) Plant Physiol. 87, 859;
and by using peptide sequence data from various Brassica
spp. obtained in this laboratory (unpublished). These
members of the 2S class of seed storage proteins are
synthesized as precursor polypeptides of 15-21 kDa and
undergo a number of processing steps to yield the stored
protein, comprised of a large and a small subunit of
combined MW of 9-17 kDa. The proposed protein sequence
(Figure 1) includes all processing regions typical of such
2S seed proteins. The first 22 amino acids should function
as a transit peptide to direct protein inclusion in storage
bodies (Chrispeels, et al. 1982 J. Cell Biol. 93:306). In
addition to the first 22 amino acids, residues 23-38, 75-84,
and 171 are those amino acids which should be deleted in the
final stored product by processing steps typical of these 2S
seed proteins. The accumulated protein should thus be two
subunits of 4.4 kDa (residues 39-74) and [illegible] kDa
(residues 85-170). Codons were selected for the synthetic
gene based on [illegible].



Example 2

Synthesis of oligonucleotides

Oligonucleotides from 56 to 69 nucleotides in length
were synthesized on an Applied Biosystems Model 380B
synthesizer, deblocked, treated with ammonia at 50°C,
vacuum-dried and resuspended in water. The oligonucleotides
were used with no further purification.

Oligonucleotides used in this construction were
designed as pairs sharing complementary overlap regions of
17-24 nucleotides, each pair having a similar overlap with
the adjacent pair (Figure 2). Following denaturation and
annealing with all oligomers present in the reaction,
molecules of the most stable duplex structure formed, and
allowed extension of the duplex from the overlaps.
Repetition of such extensions produced successively longer
molecules, hence progressively larger regions of
complementation. Sequential extension products are shown
schematically in Figure 3. The first extension reaction can
yield only those products shown, and required polymerase
fill-ins of 37-51 nucleotides from overlap regions of 17-24
base pairs in the claimed synthetic gene. The second round
of extension must also proceed from minimal overlaps (17-18
base pairs), with the addition of 79-102 nucleotides to the
complementary regions. Beginning with the third extension,
progressively larger overlaps were available. Only the
longest, hence most stable, duplex conformations are shown
in Figure 3. At the end of the third extension reaction
some completed molecules were present in the reaction. A
total of six extensions increased the probability of
obtaining complete sequences.
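
The growth of products over successive extensions can be pictured with a small idealized model. The Python sketch below is illustrative only and rests on stated assumptions: the duplexed segments are numbered 0 to n-1 along the gene, every fragment is represented by the range of segments it spans, and in each round any two fragments whose 3' ends share an overlap prime one another with perfect efficiency. The function name is hypothetical, and real reactions are far less efficient than this model.

```python
def rounds_to_full_length(n_segments: int) -> int:
    """Count idealized extension rounds until a fragment spans every segment.

    A fragment is a pair (first, last) of segment indices. Fragment (a, b) can be
    extended on fragment (c, d) when segment c starts inside (a, b) or directly
    after it, because adjacent segments share an overlap by design.
    """
    if n_segments < 1:
        raise ValueError("need at least one segment")
    fragments = {(i, i) for i in range(n_segments)}
    rounds = 0
    while (0, n_segments - 1) not in fragments:
        extended = set(fragments)
        for a, b in fragments:
            for c, d in fragments:
                if a <= c <= b + 1 and d > b:
                    extended.add((a, d))  # product spans both templates
        fragments = extended
        rounds += 1
    return rounds


print(rounds_to_full_length(6))
```

Under these assumptions the six duplexed segments of this example first give a full-length product in the third round, in line with the observation above that some completed molecules were present at the end of the third extension reaction.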

Example 3

Amplification

An aliquot of the extension products served as a
template for in vitro amplification using distal 5' and 3'
oligonucleotides (oligos 1+ and 6-) as primers. Both the
Taq polymerase and the T7 DNA polymerase extension reactions
yielded single Taq amplification products of the expected
530 bp.

Example 4

Cloning and expression

The amplification products of Example 3 were gel
purified, cut at the PstI and EcoRI sites included at the 5'
and 3' ends of the synthetic sequence, and cloned into
similarly digested pTZlOu. Recombinant plasmids were
transfected into DH5α and plated on selective media
containing X-Gal. White colonies were selected for
mini-preps of DNA, and screened for the presence of the
206 bp BglII fragment. Six of the Taq-extended clones and
seven of the T7-extended clones were sequenced completely at
least once in each direction, and the sequence analysis
results are shown in Table 1. One of six clones from Taq
extension and one of seven clones from T7 extension
contained perfect constructs. The clones from the Taq
extension contained a total of 10 induced single base pair
mutations: 6 substitutions, 3 deletions and one insertion.
The sum mutation rate with Taq extension was thus
10/(6x530) or 1 mutation per 318 nucleotides. T7 extensions
contained considerably more mutations, including 10
substitutions, one insertion and 3 deletions of 2, 3 and 9
base pairs. The sum mutation rate with T7 polymerase
extensions was thus 25/(7x530) or 1 mutation per 148
nucleotides.
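
The two rates quoted above follow from dividing the bases sequenced (clones times insert length) by the mutated bases. The short Python sketch below reproduces that arithmetic as an illustration; the function name is hypothetical, and the total of 25 for the T7 extensions is taken here to count each deleted base separately (10 + 1 + 2 + 3 + 9), which is an interpretation of the figures in the text.

```python
def nucleotides_per_mutation(mutated_bases: int, clones: int, insert_length: int) -> float:
    """Bases sequenced per mutated base: (clones * insert_length) / mutated_bases."""
    if mutated_bases <= 0:
        raise ValueError("need at least one mutated base")
    return clones * insert_length / mutated_bases


# Taq extension: 6 substitutions + 3 single-base deletions + 1 insertion.
taq_bases = 6 + 3 + 1
taq = nucleotides_per_mutation(taq_bases, clones=6, insert_length=530)

# T7 extension: 10 substitutions + 1 insertion + deletions of 2, 3 and 9 bp.
t7_bases = 10 + 1 + 2 + 3 + 9
t7 = nucleotides_per_mutation(t7_bases, clones=7, insert_length=530)

print(round(taq), round(t7))
```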

Mini plasmid preps used to screen for the BglII fragment
were digested with EcoRI and PstI, Southern blotted and
examined by hybridization to a probe prepared from the
complete insert of the correct synthetic gene clone, pTL315.
It was found that of clones produced by Taq extension, only
those possessing the BglII fragment contained any portion of
the synthetic gene. However, 24 clones obtained through T7
extension contained some portion of the synthetic gene, and
only six of these included the predicted BglII fragment.
More amplification products result from the T7 extension
mixes than from those of Taq. It is likely that the lower
temperature (37°) used for T7 extensions allowed more
mismatches during annealing and extension than that allowed
during the Taq (72°) extensions.
Table 1

Clones selected from sequential extensions

Designation   Enzyme used      Mutation        Location*
              in extensions

pTL310        Taq              A->T            OL(S)
pTL314        Taq              G deletion      FI
pTL315        Taq              none
pTL332        Taq              A deletion      OL(S)
pTL333        Taq              A->G            OL(F)
pTL340        Taq              A deletion      FI
                               C insertion     FI
pTL344        Taq              G->T            FI
                               A->C            FI
                               CA->AC          FI
pTL410        Sequenase        CA deletion     OL(S)
                               T->C            OL(S)
pTL414        Sequenase        T->G            FI
                               A insertion     FI
pTL423        Sequenase        C->T            FI
                               T->G            OL(F)
pTL459        Sequenase        none
pTL478        Sequenase        GTG deletion    FI
                               T->C            FI
pTL652        Sequenase        A->G            FI
                               G->A            FI
pTL657        Sequenase        9 bp deletion   FI
                               A->G            OL(S)
                               T->C            FI
                               C->A            FI

* OL(S): overlap during first sequential extension
  OL(F): overlap during paired oligonucleotide fill-in
  FI: fill-in region during either of the above reactions
Industrial Applicability

Directly or indirectly, animals obtain their essential
amino acids (those they are unable to synthesize) from
eating plants. Most seeds, the major plant protein sources,
are deficient in one or more amino acids essential for
proper nutrition of higher animals. Dicotyledonous seeds,
such as legumes, generally lack sufficient sulfur-containing
amino acids (cysteine and methionine), while
monocotyledonous plants (cereals) typically lack adequate
lysine, as well as tryptophan and threonine. Plants can
serve as adequate amino acid sources if complementary seeds
(e.g., rice and beans) are ingested simultaneously, and in
the proper quantity.

Cereals and legumes are combined in this complementary
way in the formulation of diets for swine. Current feeding
practices in the United States utilize 85% corn and 15%
soybean meal in swine diets. The predominance of corn as
the main ingredient is due mainly to its low cost
and high carbohydrate content. The low protein levels are
supplemented with soybean meal to provide adequate protein
nutrition. Because corn is particularly deficient in lysine
(2%), added soybean, although sufficient in lysine (6.4%)
when used as the sole protein source, cannot raise lysine
levels to those necessary for maximum swine growth. Thus
swine feed is frequently supplemented with "synthetic"
lysine. Current levels of supplemental lysine average about
1 kg per metric ton of feed at a cost of $4.50/kg lysine.
The U.S. market for lysine (primarily used in feeds) is
20 million kg, resulting in retail sales of $100M.
Strategies to reduce this supplementation of lysine include
the use of newly developed high-lysine (3.3%) corn
varieties. These varieties may obviate the need for lysine
addition to feed in the future. However, high-lysine
varieties have not yet been widely accepted by farmers,
because they typically show poor growth and low yield
characteristics. Additionally, existing high-lysine corn
lines are the result of a recessive mutation, which
increases the difficulty of breeding this characteristic
into popular varieties. Therefore, these varieties of corn
are an expensive source of high-lysine protein.

A reasonable alternative is to enhance lysine levels in
corn, soybean, and other crops through introduction of new
seed storage protein genes. For example, soymeal is a
component of animal feeds because of its high protein
quality and content. A modest increase in soy protein
lysine levels may be of great benefit to the feed market due
to the high quality protein background in soybean.
Molecular biology now provides the tools to alter amino acid
composition via gene transfer and provide, through this
invention, for the nutritional enhancement of soybeans and
other crops.
Sequence Listing

(1) GENERAL INFORMATION:

(i) APPLICANT: Barbara Ballo

(ii) TITLE OF INVENTION: Process for Enhancing the
Content of a Selected Amino Acid in a Seed Storage
Protein

(iii) NUMBER OF SEQUENCES: 13

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Pioneer Hi-Bred International, Inc.

(B) STREET: 700 Capital Square
400 Locust Street

(C) CITY: Des Moines

(D) STATE: Iowa

(E) COUNTRY: United States

(F) ZIP: 50309

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Diskette - 3.5 inch,
720 kb storage

(B) COMPUTER: IBM Compatible

(C) OPERATING SYSTEM: MS-DOS

(D) SOFTWARE: WORDPERFECT

(vi) CURRENT APPLICATION DATE:

(A) APPLICATION NO.

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Pearlmutter, Nina L.

(B) REGISTRATION NUMBER: 35,639

(C) REFERENCE/DOCKET NUMBER: 0215 US

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (515) 245-3596

(B) TELEFAX: (515) 245-3634
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 533 bases

(B) TYPE: nucleotide

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(iii) HYPOTHETICAL: No

(iv) ANTI-SENSE: N/A

(xi) SEQUENCE DESCRIPTION: Seq. ID. No. 1

[Sequence listing illegible in this copy: 533 bases with
translation, beginning TAACTGCAG ATG GCA AAC ATT TCT and
ending CAAT, corresponding to the sequence of Figure 1.]
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(3) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 69 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: No 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 2 

TAACTGCAGA TGGCAAACAT TTCTGTGGTT GCTGCTGCAC TACTGGTCTT GCTGGTGTTG 60 
GGTCATGCC 69 

(4) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 69 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: Yes 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 3 

GGTGGCATCA TCCTCTTCAA ACTCCACAAC TGTCCTGTAG ATGCTTGCAG TGGCATGACC 60 
CAACACCAG 69 
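
Oligonucleotides marked ANTI-SENSE: Yes are written 5' to 3' on the lower strand, so their overlap with the neighbouring sense oligonucleotide only becomes visible after taking the reverse complement. The following Python sketch illustrates this for the 3' end of SEQ ID NO: 2 and the whole of SEQ ID NO: 3. The function names are hypothetical, and only the last 29 bases of SEQ ID NO: 2 are used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


# Last 29 bases of SEQ ID NO: 2 (sense strand), copied from the listing.
seq2_tail = "".join("TACTGGTCTT GCTGGTGTTG GGTCATGCC".split())
# SEQ ID NO: 3 (anti-sense strand), copied from the listing.
seq3 = "".join(
    "GGTGGCATCA TCCTCTTCAA ACTCCACAAC TGTCCTGTAG ATGCTTGCAG "
    "TGGCATGACC CAACACCAG".split()
)

seq3_as_sense = reverse_complement(seq3)
k = longest_overlap(seq2_tail, seq3_as_sense)
print(k, seq3_as_sense[:k])
```

Under these inputs the two oligonucleotides share an 18-base complementary region, which is the stretch through which they would anneal.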
(5) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: No 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 4 

GAAGAGGATG ATGCCACCAA CCCAATAGGT CCTAAGATGA GGAAATGCAG AAAGGAG 57 
(6) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: Yes 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 5 

CCATTGTTGG CAAGCTCTCA ACATTTGTTC CTTCTGGAAC TCCTTTCTGC ATTTCC 56 
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(7) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: No 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 6 

GAGCTTGCCA ACAATGGTTG AGGAAACAAG CTAGACAAGG AAGATCTGAT GAATTTGAC 59 

(8) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 61 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: Yes 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 7 

GGTCTCTGCT GTGGTCCTTG AGGATTCTCC ATGTCATCTT CAAAGTCAAA TTCATCAGAT 60 
C 61 
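
Each sequence row in the listing ends with a running base count, which gives a simple check that a transcribed row agrees with the declared LENGTH. The Python sketch below is a minimal illustration using the two rows of SEQ ID NO: 7. The function name `parse_listing_rows` is hypothetical.

```python
def parse_listing_rows(rows):
    """Join listing rows into one sequence, checking each trailing running count."""
    sequence = ""
    for row in rows:
        *groups, count = row.split()  # last field is the running base count
        sequence += "".join(groups)
        if len(sequence) != int(count):
            raise ValueError(
                f"row ends at base {len(sequence)}, but the listing says {count}"
            )
    return sequence


seq7_rows = [
    "GGTCTCTGCT GTGGTCCTTG AGGATTCTCC ATGTCATCTT CAAAGTCAAA TTCATCAGAT 60",
    "C 61",
]
seq7 = parse_listing_rows(seq7_rows)
print(len(seq7))
```

A mismatch between the counted bases and the trailing number raises an error, which flags rows that lost or gained characters in transcription.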
(9) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: No 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 8 

GGACCACAGC AGAGACCTCC TCTCCTTCAG AAGTGCTGTG AGCAACTCAA ACAGATG 57 

(10) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: Yes 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 9 

CAGCTTTGCT GGCACCTTTA AGGGTTGGGC AAACACACTG AGATTGCATC TGTTTGAGTT 60 
GCTC 64 
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(11) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: No 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 10 

AAGGTGCCAG CAAAGCTGTG AAACAGGAAG AGCAGCAACA AGGCCAGCAA CAAGGTAAGC 60 
AGCAG 65 

(12) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 56 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: Yes 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 11 

GGAAGGTGTT TGGCAGTCTT ATAGATCTTC CTAACCATCT GCTGCTTACC TTGTTG 56 
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(13) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: No 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 12 

GACTGCCAAA CACCTTCCTA AAGTCTGTGA CATTCCACAG GTTGATGTAT GCCCATTTC 59 
(14) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 bases 

(B) TYPE: nucleotide 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: synthetic DNA 

(iii) HYPOTHETICAL: No 

(iv) ANTI-SENSE: Yes 

(xi) SEQUENCE DESCRIPTION: 

Seq. ID. No. 13 

ATTGAATTCT AGTATGAGGG CCCAGGCATG GTCTTCTGAA ATGGGCATAC ATCAACC 57 
WHAT IS CLAIMED IS: 

1. A method of making an improved seed storage protein 
by altering a naturally-occurring seed storage protein 
having a known amino acid sequence to increase its content of a 
selected amino acid, comprising the steps of: 

a. identifying conserved, non-conserved and hypervariable 
residues in the amino acid sequence of the 
naturally-occurring protein by comparison of the amino acid 
sequence of the protein with amino acid sequences of other 
homologous seed storage proteins; and 

b. replacing one or more non-conserved DNA 
residues coding for the protein with DNA residues coding for 
the selected amino acid, provided that 

i) the replacement is conservative with 
respect to hydrophobicity, polarity and charge, and 

ii) the replacement does not create any pairs 
of adjacent amino acids which are not found in the 
naturally-occurring seed storage protein or the homologous 
seed storage proteins. 
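
Proviso (ii) of claim 1 can be expressed as a set-membership test on adjacent residue pairs. The Python sketch below is an illustration under stated assumptions, not the applicant's procedure. The function names are hypothetical, the sequences are short made-up examples in one-letter code rather than real seed storage proteins, the sequences are treated as unaligned strings, and proviso (i) on hydrophobicity, polarity and charge is not modelled.

```python
def adjacent_pairs(protein):
    """Set of all adjacent residue pairs (dipeptides) in a protein string."""
    return {protein[i:i + 2] for i in range(len(protein) - 1)}


def creates_only_known_pairs(natural, homologs, position, new_residue):
    """Check proviso (ii): a substitution at 0-based `position` must not create
    an adjacent pair absent from the natural protein and all of its homologs."""
    allowed = adjacent_pairs(natural)
    for homolog in homologs:
        allowed |= adjacent_pairs(homolog)

    altered = natural[:position] + new_residue + natural[position + 1:]
    new_pairs = []
    if position > 0:
        new_pairs.append(altered[position - 1:position + 1])  # left neighbour
    if position < len(altered) - 1:
        new_pairs.append(altered[position:position + 2])      # right neighbour
    return all(pair in allowed for pair in new_pairs)


# Hypothetical toy sequences, for illustration only.
natural = "MARKQLE"
homologs = ["MAKKQLE", "MGRKELE"]

# R3 -> K creates pairs AK and KK, both present in the first homolog.
print(creates_only_known_pairs(natural, homologs, 2, "K"))
# Q5 -> K creates pair KL, which occurs in none of the sequences.
print(creates_only_known_pairs(natural, homologs, 4, "K"))
```

A candidate lysine substitution would be kept only when the check returns True and the separate conservativeness test of proviso (i) is also satisfied.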

2. A method according to claim 1 comprising the further 
steps of synthesizing a DNA sequence which codes for the 
altered seed storage protein and synthesizing the altered 
seed storage protein by transcription and translation of the 
DNA sequence in a living cell. 

3. A method according to claim 2 wherein the DNA 
sequence is synthesized by site-directed mutagenesis of a 
DNA sequence which codes for the naturally-occurring seed 
storage protein. 

4. A method according to claim 2 wherein the DNA 
sequence which codes for the naturally-occurring seed 
storage protein is genomic DNA. 
5. A method according to claim 3 wherein the DNA 
sequence which codes for the naturally-occurring seed 
storage protein is genomic DNA. 

6. A method according to claim 2 wherein the DNA 
sequence is: SEQ ID NO: 1 
or a DNA sequence at least 80% homologous thereto. 

7. A method according to claim 3 wherein the DNA 
sequence is: SEQ ID NO: 1 
or a DNA sequence at least 80% homologous thereto. 
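
The claims do not state how the 80% figure is to be measured. One simple reading, offered here only as an assumption, is percent identity between two sequences that have already been aligned to equal length without gaps. The Python sketch below uses hypothetical function names and made-up ten-base inputs.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical bases in two pre-aligned,
    equal-length sequences (an assumed measure; the claims do not define one)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a, seq_b, threshold=80.0):
    """True when the assumed identity measure is at least `threshold` percent."""
    return percent_identity(seq_a, seq_b) >= threshold


# Made-up example: one mismatch in ten positions.
print(percent_identity("ATGGCAAACA", "ATGGCTAACA"))
print(meets_threshold("ATGGCAAACA", "ATGGCTAACA"))
```

Sequences of different length would first need an alignment step, which this sketch leaves out.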

8. A method according to claim 2 wherein the DNA 
sequence is synthesized by the steps of: 

a. synthesizing a set of single-stranded partial 
DNA sequences capable of being assembled in complementary 
overlapping relationship to provide the complete DNA 
sequence of the altered protein, each partial sequence having 
a length of less than about 100 base pairs, each partial 
sequence having 3' and 5' oligonucleotide ends which are 
complementary to the respective 3' and 5' oligonucleotide 
ends of the partial sequences which are respectively 3' and 
5' to the partial sequence in the complete sequence of the 
altered protein; and 

b. annealing the partial sequences to produce 
extended sequences consisting of two or more partial 
sequences in complementary overlapping relationship; 

c. filling nucleotide gaps in the extended 
sequences to produce double-stranded extended sequences; 

d. denaturing the double-stranded extended 
sequences to produce longer sequences consisting of two or 
more partial sequences; and 

e. repeating steps (b) through (d) until the 
extended sequence produced by step (c) is the complete DNA 
sequence of the altered protein. 

9. A method of synthesizing a complete DNA sequence 
comprising the steps of: 

a. synthesizing a set of single-stranded partial 
DNA sequences capable of being assembled in complementary 
overlapping relationship to provide the complete DNA 
sequence, each partial sequence having 3' and 5' ends which 
are complementary to the respective 3' and 5' ends of the 
partial sequences which are respectively 3' and 5' to the 
partial sequence in the complete sequence; and 

b. annealing the partial sequences to produce 
extended sequences consisting of two or more partial 
sequences in complementary overlapping relationship; 

c. filling nucleotide gaps in the extended 
sequences to produce double-stranded extended sequences; 

d. denaturing the double-stranded extended 
sequences to produce longer sequences consisting of two or 
more partial sequences; and 

e. repeating steps (b) through (d) until 
the extended sequence produced by step (c) is the complete DNA 
sequence. 
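
The anneal, fill, denature and repeat cycle of steps (b) through (e) can be modelled on strings by working in top-strand coordinates. The Python sketch below is a simplified illustration, not a description of the laboratory protocol. The function names are hypothetical, polymerase fill-in is reduced to joining two fragments through their shared overlap, and the 48-base target is sliced into two made-up oligonucleotide pairs with 6-base overlaps.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def merge(left, right, min_overlap=4):
    """Join two top-strand fragments through their longest suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("fragments do not overlap")


def assemble(plus_oligos, minus_oligos):
    """Return (assembled top strand, number of extension rounds)."""
    # First round: each plus oligo anneals to its partner minus oligo and the
    # gaps are filled, giving one extended fragment per pair.
    fragments = [
        merge(plus, reverse_complement(minus))
        for plus, minus in zip(plus_oligos, minus_oligos)
    ]
    rounds = 1
    # Later rounds: after denaturing, neighbouring fragments anneal through
    # their overlap and are filled in, so the fragment count drops by one.
    while len(fragments) > 1:
        fragments = [merge(a, b) for a, b in zip(fragments, fragments[1:])]
        rounds += 1
    return fragments[0], rounds


# Made-up design: a 48-base target cut into two plus/minus pairs.
target = "TTGCTGGTGTTGGGTCATGCCACTGCAAGCATCTACAGGACAGTTGTG"
plus = [target[0:16], target[20:36]]
minus = [reverse_complement(target[10:26]), reverse_complement(target[30:48])]

product, rounds = assemble(plus, minus)
print(product == target, rounds)
```

In this model a design with n oligonucleotide pairs needs n rounds, since the first round yields n pair products and every later round reduces the number of fragments by one.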
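PstI 

TAACTGCAG ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC 48 
Met Ala Asn Ile Ser Val Val Ala Ala Ala Leu Leu Val 
1 5 10 

TTG CTG GTG TTG GGT CAT GCC ACT GCA AGC ATC TAC AGG ACA GTT GTG 96 
Leu Leu Val Leu Gly His Ala Thr Ala Ser Ile Tyr Arg Thr Val Val 
15 20 25 

GAG TTT GAA GAG GAT GAT GCC ACC AAC CCA ATA GGT CCT AAG ATG AGG 144 
Glu Phe Glu Glu Asp Asp Ala Thr Asn Pro Ile Gly Pro Lys Met Arg 
30 35 40 45 

AAA TGC AGA AAG GAG TTC CAG AAG GAA CAA ATG TTG AGA GCT TGC CAA 192 
Lys Cys Arg Lys Glu Phe Gln Lys Glu Gln Met Leu Arg Ala Cys Gln 
50 55 60 65 

CAA TGG TTG AGG AAA CAA GCT AGA CAA GGA AGA TCT GAT GAA TTT GAC 240 
Gln Trp Leu Arg Lys Gln Ala Arg Gln Gly Arg Ser Asp Glu Phe Asp 
70 75 80 85 

TTT GAA GAT GAC ATG GAG AAT CCT CAA GGA CCA CAG CAG AGA CCT CCT 288 
Phe Glu Asp Asp Met Glu Asn Pro Gln Gly Pro Gln Gln Arg Pro Pro 
90 95 100 

CTC CTT CAG AAG TGC TGT GAG CAA CTC AAA CAG ATG CAA TCT CAG TGT 336 
Leu Leu Gln Lys Cys Cys Glu Gln Leu Lys Gln Met Gln Ser Gln Cys 
105 110 115 

GTT TGC CCA ACC CTT AAA GGT GCC AGC AAA GCT GTG AAA CAG GAA GAG 384 
Val Cys Pro Thr Leu Lys Gly Ala Ser Lys Ala Val Lys Gln Glu Glu 
120 125 130 

CAG CAA CAA GGC CAG CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC 432 
Gln Gln Gln Gly Gln Gln Gln Gly Lys Gln Gln Met Val Arg Lys Ile 
135 140 145 

TAT AAG ACT GCC AAA CAC CTT CCT AAA GTC TGT GAC ATT CCA CAG GTT 480 
Tyr Lys Thr Ala Lys His Leu Pro Lys Val Cys Asp Ile Pro Gln Val 
150 155 160 165 

EcoRI 

GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAGAATT 529 
Asp Val Cys Pro Phe Gln Lys Thr Met Pro Gly Pro Ser Tyr 
170 175 

CAAT 533 

FIGURE 1 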
[Alignment of the overlapping plus-strand and minus-strand synthetic oligonucleotides (SEQ ID NOS: 2 to 13) along the sequence of SEQ ID NO: 1] 

FIGURE 2 



WO 94/10315 

PCT/US93/10090 

Sequential extension from overlap regions of oligonucleotides: 

First extension 
-> Oligo pair 1+/1- 
-> Oligo pair 2+/2- 
-> Oligo pair 3+/3- 
-> Oligo pair 4+/4- 
-> Oligo pair 5+/5- 
-> Oligo pair 6+/6- 

Second extension 
-> Oligo pairs 1+2 
-> Oligo pairs 2+3 
-> Oligo pairs 3+4 
-> Oligo pairs 4+5 
-> Oligo pairs 5+6 

FIGURE 3 
Third extension 
-> Oligo pairs 1+2+3 
-> Oligo pairs 2+3+4 
-> Oligo pairs 3+4+5 
-> Oligo pairs 4+5+6 

Fourth extension 
-> Oligo pairs 1+2+3+4 
-> Oligo pairs 2+3+4+5 
-> Oligo pairs 3+4+5+6 

Fifth extension 
-> Oligo pairs 1-5 
-> Oligo pairs 2-6 

Sixth extension 
-> 

FIGURE 4 
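
The pattern in the extension diagrams, in which each round yields runs of consecutive oligo pairs that are one pair longer than in the round before, can be enumerated directly. The Python sketch below is illustrative only. The function name is hypothetical, and it assumes that every round joins neighbouring products and nothing else.

```python
def extension_products(n_pairs):
    """Map each extension round to the runs of consecutive oligo pairs it yields."""
    return {
        rnd: [
            list(range(start, start + rnd))
            for start in range(1, n_pairs - rnd + 2)
        ]
        for rnd in range(1, n_pairs + 1)
    }


products = extension_products(6)
for rnd, runs in products.items():
    labels = ["+".join(str(pair) for pair in run) for run in runs]
    print(f"extension {rnd}: {', '.join(labels)}")
```

For six oligo pairs this gives 6, 5, 4, 3, 2 and 1 products in successive rounds, the last being the single full-length run of pairs 1 through 6.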
PstI 

TAACTGCAG ATG GCA AAC ATT TCT GTG GTT GCT GCT GCA CTA CTG GTC 48 
Met Ala Asn Ile Ser Val Val Ala Ala Ala Leu Leu Val 
1 5 10 

TTG CTG GTG TTG GGT CAT GCC ACT GCA AGC ATC TAC AGG ACA GTT GTG 96 
Leu Leu Val Leu Gly His Ala Thr Ala Ser Ile Tyr Arg Thr Val Val 
15 20 25 

GAG TTT GAA GAG GAT GAT GCC ACC AAC CCA ATA GGT CCT AAG ATG AGG 144 
Glu Phe Glu Glu Asp Asp Ala Thr Asn Pro Ile Gly Pro Lys Met Arg 
30 35 40 45 

AAA TGC AGA AAG GAG TTC CAG AAG GAA CAA ATG TTG AGA GCT TGC CAA 192 
Lys Cys Arg Lys Glu Phe Gln Lys Glu Gln Met Leu Arg Ala Cys Gln 
50 55 60 65 

CAA TGG TTG AGG AAA CAA GCT AGA CAA GGA AGA TCT GAT GAA TTT GAC 240 
Gln Trp Leu Arg Lys Gln Ala Arg Gln Gly Arg Ser Asp Glu Phe Asp 
70 75 80 85 

TTT GAA GAT GAC ATG GAG AAT CCT CAA GGA CCA CAG CAG AGA CCT CCT 288 
Phe Glu Asp Asp Met Glu Asn Pro Gln Gly Pro Gln Gln Arg Pro Pro 
90 95 100 

CTC CTT CAG AAG TGC TGT GAG CAA CTC AAA CAG ATG CAA TCT CAG TGT 336 
Leu Leu Gln Lys Cys Cys Glu Gln Leu Lys Gln Met Gln Ser Gln Cys 
105 110 115 

GTT TGC CCA ACC CTT AAA GGT GCC AGC AAA GCT GTG AAA CAG GAA GAG 384 
Val Cys Pro Thr Leu Lys Gly Ala Ser Lys Ala Val Lys Gln Glu Glu 
120 125 130 

CAG CAA CAA GGC CAG CAA CAA GGT AAG CAG CAG ATG GTT AGG AAG ATC 432 
Gln Gln Gln Gly Gln Gln Gln Gly Lys Gln Gln Met Val Arg Lys Ile 
135 140 145 

TAT AAG ACT GCC AAA CAC CTT CCT AAA GTC TGT GAC ATT CCA CAG GTT 480 
Tyr Lys Thr Ala Lys His Leu Pro Lys Val Cys Asp Ile Pro Gln Val 
150 155 160 165 

EcoRI 

GAT GTA TGC CCA TTT CAG AAG ACC ATG CCT GGG CCC TCA TAC TAGAATT 529 
Asp Val Cys Pro Phe Gln Lys Thr Met Pro Gly Pro Ser Tyr 
170 175 

CAAT 533 
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